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(57) Abstract: A method for the automated analysis of 
digital images, particularly for the purpose of assessing 
mitotic activity from images of histological slides for 
prognostication of breast cancer. The method includes 
the steps of identifying the locations of objects within 
the image which have intensity and size characteristics 
consistent with mitotic epithelial cell nuclei, taking the 
darkest 10 % of those objects, deriving contours indicat- 
ing their boundary shape, and smoothing and measuring 
the curvature around the boundaries using a Probability 
Density Association Filter (PDAF). The PDAF output 
is used to compute a measure of any concavity of the 
boundary - a good indicator of mitosis. Objects are fi- 
nally classified as representing mitotic nuclei or not, as a 
function of boundary concavity and mean intensity, by 
use of a Fisher classifier trained on known examples. 
Other uses for the method could include the analysis of 
images of soil samples containing certain types of seeds 
or other particles. 
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Image Analysis 



FIELD OF THE INVENTION 

The present invention relates to the automated analysis of digital images. It is 
more particularly concerned with the automated identification of mitotic activity in 

5 digital images of histological or cytology specimens and most particularly for the 
purpose of assessing the presence and severity of cancer in breast tissue, and it is 
in this context that the invention is principally described herein. The invention 
may, however, also find application in the assessment of other forms of cancer, 
such as colon and cervical cancer, and in the analysis of various other kinds of 

10 structure presenting image components which are amenable to identification in a 
similar way, for example in the analysis of soil samples containing certain types of 
seeds or other particles. 

BACKGROUND AND SUMMARY OF THE INVENTION 

15 

Many thousands of women die needlessly each year from breast cancer, a cancer 
from which there Is theoretically a high probability of survival if detected sufficiently 
early. If the presence of cancerous tissue is missed in a sample, then, .by the time 
the next test Is undertaken, the cancer may have progressed and the chance of 
20 survival significantly reduced. The importance of detecting cancerous tissue In the 
samples can therefore not be over-emphasised. 

A typical national breast screening programme uses mammography for the early 
detection of impalpable lesions. Once a lesion indicative of breast cancer is 

25 detected, then tissue samples are taken and examined By a fraihed 

histopathologist to establish a diagnosis and prognosis. More particularly, one of 
the principal prognostic factors for breast cancer is the extent of mitotic activity, 
that is to say the degree of epithelial cell division that is taking place. A 
histopathologlcal slide Is effectively a "snapshot" representing a very short time 

30 IntefvraT In a" cell divisfdn prodess, so the chance of a particular slide stiowing a 

particular phase of mitotic activity is very small; if such a phase is in fact present in 
a slide, that is a good indicator of how fast a potential tumour Is growing. 
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In the existing manual procedure for scoring mitotic activity a liistopathologist 
places a slide under a microscope and examines a region of it (referred to as a tile) 
at a magnification of x40 for indications of mitoses. Typically ten different tiles 
from the tissue sample are examined and a total count is made of the number of 
5 . cell divisions which, in the histopathologist's opinion, are seen to be tal<ing place in 
the ten tiles. This is then converted to an indication of cancer grade typically in 
accordance with the following table: 

Number of Mitotic Cells Cancer Grade 

per Ten Tiles 

0 to N Grade 1 

(N + 1)toM Grade 2 

> M Grade 3 



10 where Grade 1 is the least serious and Grade 3 is the most serious. Values of N 
and M are typically 5 and 10 but will vary in different schemes depending on the 
size of the tiles being observed. 

This is, however, a time consuming, labour Intensive and expensive process. 

15 Qualification to perform such examination is not easy to obtain and requires 
frequent review. The examination itself requires the interpretation of colour 
images by eye, a highly subjective process characterised by considerable 
variations in both inter, and intra-observer analysis, i.e. variances in observation 
may occur for the same sample by different histopathologists, and by the same 

20 histopathologist at different times. For example, studies have shown that two 
different histopathologists examining the same ten samples may give different 
opinions on three of them, an en^or of 30%. This problem Is exacerbated by the 
complexity of some samples, espedally in marginal cases where there may not be 
a defTnitive conclusion. If sufTicient trained staff are not available this impacts 

25 upon pressures to complete the analysis, potentially leading to erroneous 
assessments and delays In diagnosis. 

These problems mean that there are practical limitations on tiie extent and 
effectiveness of screening for breast cancer with the consequence that some 
30 women are not being correctly identified as having the disease and, on some 
occasions, this failure may result in premature death. Conversely, others are 
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being incorrectly diagnosed with breast cancer and are therefore undergoing 
potentially traumatic treatment unnecessarily. 

It is thus an aim of the invention to provide an automated method of image analysis 
5 which can be embodied in a robust, objective and cost-effective tool to assist in the 
diagnosis and prognosis of breast cancer, although as previously indicated the 
invention may also find application in other fields. 

In one aspect the invention accordingly resides in a method for the automated 
10 analysis of a digital image comprising an anray of pixels, including the steps of: 
identifying the locations of objects within the image which have specified Intensity 
and size characteristics; defining regions of specified extent within the image 
which contain respective said objects; deriving from the data within respective said 
regions one or more respective closed contours comprising points of equal 
15 intensities; and estimating the curvature of at least one respective said contour 
within respective said regions at least to produce a measure of any concavity 
thereof. 

As will be understood from the ensuing detailed description of a preferred 
20 embodiment, such a method is of use in identifying mitotic cell nuclei in digital 
images of histopathological slides. 

The invention also resides in apparatus for the automated analysis of a digital 
image comprising means to perform the foregoing method and in a computer 
25 program product comprising a computer readable medium having thereon 
computer program code means adapted to cause a computer to execute the 
foregoing method and in a computer program comprising Instructions so to do. 

These and other aspects of the invention will now be more particulariy described, 
30 by way of example, with reference to the accompanying drawings and in the 

context of an automated system for grading cancer on the basis of the numbers of 
mitotic epithelial cell nuclei in digital images of histopathological slides of potential 
carcinomas of the breast. 

35 BRIEF DESCRIPTION OF THE DRAWINGS 

In the drawings: 
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Figure 1 is a block diagram of an automated process in accordance with the 
invention for measuring mitotic activity for patient diagnosis; 

5 Figure 2 is a more detailed block diagram of the main stages in the mitosis 
detection and measurement block of Figure 1; 

Figures 3 and 4 are simplified visualisations of the contour selection stage of the 
process of Figure 2; and 

10 

Figure 5 Illustrates a decision boundary In a Fisher classifier as used in a later 
stage of the process of Figure 2. 

DETAILED DESCRIPTION 

15 

Figure 1 shows a process for the assessment of tissue samples In the form of 
histopathologlcal slides of potential carcinomas of the breast. The process 
measures mitotic activity of epithelial cells to produce a parameter for use by a 
pathologist as the basis for assessing patient diagnosis. It employs a database 1 , 
20 which maintains digitised image data obtained from histological slides. Sections 
are cut from breast tissue samples (biopsies), placed on respective slides and 
stained using the staining agent Haematoxylin & Eosin (H&E), which is a common 
stain for delineating tissue and cellular structure. 

25 To obtain the digitised image data for analysis, a histopathologist scans a slide 
under a microscope and at 40x magnification selects regions of the slide which 
appear to be most promising in terms of analysing mitotic activity. Eac^i of these 
regions Is then photographed using the microscope and a digital camera. In one 
example a Zeiss Axioskop microscope has been used with a Jenoptiks Progres 

30 3012 digital camera. This produces for each region a respective digitised image in 
three colours, i.e. red, green and blue (R, G & B). Respective Intensity values in 
the R, G and B image planes are thus obtained for each pixel in an an-ay. In the 
prefen-ed embodiment there is an eight-bit range of pixel intensities (values of 0 to 
255) for each colour and the an^y comprises 1476 pixels across by 1160 pixels 

35 down, with a pixel size of 220nm square. The image data is stored temporarily at 
1 for later use. Ten digitised images (electronic equivalents of tiles) are required 
for the detection and measurement of mitotic activity at 2, which then provides 
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input to a diagnostic report at 3. In principle the processing stages to be described 
in detail below can operate on a single waveband (R, G or B) image or a 
combination of them. In practice, however, the red waveband has been found to 
contain the most information for discriminating between mitotic and other ceils 
5 when stained with H&E, and is assumed to be used in the following description. 
The process can be performed in a suitably programmed personal computer (PC) 
or other general purpose computer of suitable processing power or in dedicated 
hardware. 

10 Figure 2 shows in more detail the processing stages comprised in the block 2 of 
Figure 1 . They are carried out for each of the ten digitised images refenred to 
above and will now be described for one such image (tile). To aid in the 
understanding of this description it is recalled that the aim of the process is to 
identify and count the number of mitotic epithelial cell nuclei (if any) in each tile. In 

15 images acquired as described above such nuclei generally appear darker than 
normal epithelial cell nuclei, and also have a different shape. Nomnal nuclei are 
generally convex with smooth boundaries while mitotic nuclei are more irregular in 
shape and have ragged boundaries. However, it is not always the case that 
mitotic epithelial cell nuclei are the dari<est objects in a given tile; for example 

20 stromal cells, lymphocytes and necrotic cells may be dari^er. Given the relatively 
low numbers of mitoses which may be present in any given tile and yet may 
indicate serious disease it is important that as many as possible are con-ectly 
identified while at the same time minimising the number of any normal cell nuclei or 
other objects incorrectly identified as mitotic. 

25 Location of candidate ceil nuclei 

Refem'ng now to Figure 2, the first processing stage 21 consists of locating all 
possible candidate cell nuclei. The approach adopted for identifying the locations of 
potential mitotic nuclei Is based on the fact that they are generally darker than 
average nuclei. Mitotic nuclei appear in the image as solid dark objects (i.e. dark all 
30 the way through) most of the time, or instead occasionally they form groups of 

small daric clumps. Hence the aim is to find concentrations of dark pixelst these are 
not necessarily connected groups of dark pixels, but a region containing a sufficient 
number of clustered dark pixels. 

There are various methods of doing this. One example is simple grey-level 
35 segmentation, where a threshold is chosen and only those pixels having grey- 
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levels below this threshold are selected. The drawback of this approach is that 
some mitotic nuclei are not particularly dark, but are only distinguishable from their 
shape characteristics. Choosing a threshold sufficiently low to detect such nuclei 
would yield an excess of clutter. 

5 The preferred approach is to use the multiresolution blob filtering described below. 
However, as will be apparent to those skilled in the image processing art, the 
present invention may be practised without employing this particular technique. 
Alternatives include the processes described as mitotic cueing in our copending 
United Kingdom patent application no. 0226787.0. The general principle is, given 

10 that the approximate size of the nuclei is known, to apply a radially-symmetric filter 
whose output is large in magnitude when there is a region of local brightness or 
dari<ness whose shape and size approximately matches that of the filter. This filter 
should be a difference filter with zero mean, so areas of constant intensity are 
suppressed. 

15 The method will now be described in temns of a specific implementation, namely a 
multi-scale blob filter as known e.g. firom "Multiresolution analysis of remotely 
sensed imager/', J.G.Jones, R.W.Thomas, P.G.Earwicker, Int J. Remote Sensing, 
1 991 , Vol 1 2, No 1 , pp 1 07-1 24. The process will be described for filtering the 
image using a particular size of blob filter, where these are defined at successive 

20 . octave (powers of 2) scale sizes. 

The recursive construction process for the multi-scale blob filter involves two filters; 
a 3x3 Laplacian blob filter (L) and a 3x3 smootiiing filter (s) as defined below. 
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These two filters form a basis for filtering over a set of octave scale sizes, 
25 according to the following process: 

To enhance blob-shaped objects at ttie original resolution (octave scale 1 , pixel 
size 1), the image is conflated wifti the 3x3 Laplacian filter alone: 



F,(.m,n) = ^ £/(m + i,n + J) * LQ + 2J + 2) 
,-=-ij— 1 
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where / is the original Image, and the range of the indices m and n is set such that 
the indices In the summation above are always within the original Image 
dimensions (so m and n start at 2). Values of the filtered Image at locations outside 
these ranges are set to zero. 

5 For computational efficiency, multiplications by ±1 need not be performed explicitly. 
Thus the filter output value for a pixel located at position (/j) Is given by: 

To enhance blob-shaped objects at a resolution one octave above the original 
10 (octave scale 2, pixel size 3), the image is first correlated with the 3x3 smoothing 
filter (s), forming a smoothed image (S2). The 3x3 Laplacian blob filter (L) is then 
expanded by a factor of two, by padding it with zeros, to form a 5x5 filter [-1 0-10- 
1; 0 0 0 0 0; -1 0 8 0 -1 ; 0 0 0 0 0; -1 0 -1 0 -1]. This is then correlated with the 
smoothed Image (S2) to fomi a filtered image (F2), but for computational efficiency, 
15 only the non-zero filter coefficients are used, thus: 

S2 (m,n) = 2 + i,n + j) * sQ + 2, 7 + 2) 

^2 «) = Z Z (m + 2U n + 2j) * Lii + 2, 7 + 2) 

where / Is the original image, and the range of the indices m and n Is set such that 
the indices in the summation above are always within the original Image 

dimensions (so m and n start at 4). Values of the filtered Image at locations outside 

.J 

20 these ranges are set to zero. 

The above double correlation is equivalent to a single correlation of the original 
image with a 7x7 filter formed from correlating the expanded 5x5 Laplacian with the 
3x3 smoothing filter, but this larger filter is never formed explicitly. 

To enhance blob-shaped objects at a resolution two octaves above the original 
25 (scale 3, pixel size 7), the smoothing filter Is expanded by a factor of 2 in the same 
manner as the Laplacian above, then correlated with the smoothed image (S2) 
above to give a lower-resolution smoothed image (S3) thus: 
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Following this, the 5x5 Laplaclan filter is expanded by a factor of 2 by padding with 
zeros to fonm a 9x9 filter, which Is conrelated with the smoothed image (S3) in the 
same computationally efficient manner, thus: 

F^i^^n) = 2] XS^(m + 4Un + 4j)^L(i-^2J + 2) 

5 This process is repeated to obtain results at successive octave scales, namely 
expanding both the smoothing filter and the Laplacian blob filter each time. 

The above process may be used to produce a "blob-filtered" image at any of the 
required octave scales. Objects of interest (i.e. clusters of dark pixels in this case) 
will have the greatest values of the magnitude of the filter output. The locally 
10 strongest filter output will occur at the centre of the object of interest. Individual 
objects (called "blobs") are now identified by finding local minima of the filter 
output, where each blob is assigned a position and intensity, the latter being the 
value of the filter output at the local minimum. 

In this application, objects which are dark relative to the background are identified 
15 by finding local minima of the filter output at one chosen scale, in this instance 
octave scale 5 (a nuclear size of 31 pixels across). 

For computational efficiency, the spatial resolution of the image is reduced prior to 
blob filtering, using the following thinning method. Each reduction in resolution by a 
factor of two (a "thin") is achieved by firstiy correlating the image with the 3x3 
20 smoothing filter (s), then sub-sampling by a factor of two. The formula for a single 
thin is: 

Tim, rO^Yiil + j,2/z + y ) * s(i + 2, y + 2) 

where / is the original image, 7 is the thinned image, and the indices m and n 
range from 1 to the dimensions of the thinned image. Each dimension of the 
_25 thinned image is given by subtracting 1 or 2 from the corresponding dimension of 
the original image, depending on whether this is odd or even respectively, then 
dividing by 2. 



In this instance, the image is reduced in resolution by a factor of 4, by applying the 
above process twice, firstiy on the original image, and then again on the resulting 
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thinned image, to produce a new image whose linear dimensions are a quarter of 
the size of the original (area is 1/16). For example, an original image of a tile of 
size 1476x1 160 pixels would become 368x289 pixels. Blobs are now extracted as 
described above from the reduced-resolution image at octave scale 3 (7x7 pixels 
5 across), this being equivalent to extracting scale 5 blobs from the original image, 
but being more computationally efficient. 

This process identifies all objects which fonn dark clusters of the requisite size, 
which may include not only mitotic epithelial cell nuclei but also bacl^ground clutter, 
stromal cells, lymphocytes and/or necrotic cells, plus fainter normal epithelial nuclei 

10 which are not of interest. Since it is known that the mitotic nuclei are likely to be 
much darker than average, only the darkest 10% of blobs are selected for further 
analysis. This is achieved by sorting the blobs into ascending order of filter output 
(so the darkest occur first), using a Quicksort algorithm (such as described in 
Klette R., Zamperoniu P., 'Handbook of Image Processing Operators', John Wiley 

15 & Sons, 1 996), finding the 10**^ percentile of the sorted values, and choosing all 
blobs darker than this percentile. 

Segmentation and first clutter rejection 

The next processing stage 22 aims to find an approximate segmentation of the 
image, to separate regions (defined as connected sets of pixels) potentially 
20 associated with the cell nuclei of interest, from the background. The full-sized 
original image (red component) is used at the commencement of this stage. 

Firstly, a grey-level threshold is selected. This is achieved by choosing a set of 
15x15 pixel neighbourhoods centred on each of the blobs selected at the end of 
stage 21, collating all pixels within all these neighbourhoods into a single list, and 
25 computing the mean grey-level of these pixels. 

A new thresholded binary image is now produced. Pixels in the red component of 
the original image whose grey levels are below (darker than) the threshold mean 
computed above are set to 1 ; remaining pixels are set to 0. 

Connected component labelling is now applied to this binary image. This is a 
30 known image processing technique (such as described In A Rosenfeld and A C 
Kak, 'Digital Picture Processing', Vols. 1 & 2. Academic Press, New Yoric, 1982) 
which gives numerical labels to connected regions in the binary image, these being 
groups of connected pixels whose values are all 1 . An 8-connectedness rule is 
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used, so pixels are deemed to be connected when they are horizontally, vertically, 
or diagonally adjacent. Each region corresponding to a selected blob from stage 21 
is assigned a separate label, enabling pixels belonging to those regions to be 
identified. The following region properties are then computed: 

5 Area = number of pixels within the region 

Thickness = minimum thickness of the region, defined thus: for each pixel In 
the region, find the minimum distance from that pixel to the outside of the 
region. Thickness is then defined to be the maximum of these minimum 
distances. Note that thickness is not the same as width; for a rectangle the 
1 0 thickness is half the width, and for a circle the thickness is the radius. 

Regions whose area is less than 190 pixels or whose thickness is less than 4 
pixels are rejected, these being too srhall to be mitotic cell nuclei. 

At this stage the mean grey-level of the pixels within each region is also calculated 
from the red component of the original image.' The overall mean and standard 
15 deviation of these mean grey-levels is then found for later use in grey-level 
normalisation (stage 25). 

Contour selection 

The next processing stage 23 incorporates two levels of contour selection to gain a 
better representation of the actual shape of the boundary of each remaining object 
20 at both low and high resolutions. Firstly, a low-resolution (large-scale) contour is 
computed, which gives an approximate shape, and secondly a high-resolution 
(small-scale) contour is found which gives a more accurate boundary 
representation. Following consistency checks between the two contours, attributes 
of the boundary are then measured from the small-scale contour. 

25 For each of the objects remaining after stage 22, a local region of interest (ROI) is 
selected. This ROI Is centra on the nominal centre of the object (as found in stage 
21), and has ao extent of 50 pixels in each direction, the region size being 
truncated as necessary to ensure the ROI lies within the bounds of the original 
image. This allows ROls which would othenwise overlap the edges of the image to 

30 be included. Alternatively the ROls could be defined by taking the regions 

Identified In stage 22 and adding a border of a selected number of pixels. In either 
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case, it is desirable tliat the ROis exceed tiie size of those regions somewhat in 
order to ensure the generation of the low-resolution contours. 

To find a low-resolution representation for the boundary of each object, the region 
defined by the ROI above is used to define a sub-image within the output of the 
5 blob filter (stage 21 ). This sub-image will consist of both positive and negative grey 
levels. Contours at two levels within this sub-image are then sought, namely at 
levels 0 and -10 which have been found to be best experimentally. By virtue of the 
operation of the blob filter In stage 21 , the zero level contour in the respective sub- 
image is that contour which exhibits the highest edge strength. A contour is a 

10 curve consisting of points of equal value for some given function; in this case the 
function is defined by the grey-level pixel values. In this embodiment, the Matlab® 
contour function is employed but any contouring algorithm can be used which 
returns contours in the same form, as a set of contiguous points ordered around 
the contour; (Matlab® is a well known computational tool from The MathWorks, 

15 inc.). Matlab® returns a set of locations with sub-pixel resolution which are in order 
of location abound the contour, i.e. traversing the set of locations is equivalent to 
walking around the contour. Contours are only treated as valid if they satisfy all the 
following four conditions: 

- they form closed loops within the ROI, i.e. the last contour point is the 
20 same as the first contour point; 

- they are consistent with the location of the object (there is at least one 
contour point whose distance from the nominal centre of the object is 
less than or equal to 30 pixels); 

- they have a sufficiently similar area to the "nominal area" found from the 
25 grey-level segmentation computed in stage 22 (the definition of the area 

within a contour is given later in this section). The contour area must be 
at least 50% of the nominal area; 

- they have the correct grey-level orientation, namely pixels within the 
contour arfe darker than those outside the contour. 

30 The object is retained for further analysis (maintained in list in computer) only if a 
valid contour is found from at least one of the two contour levels (0 and -10). If both 
contour levels yield a valid contour, then the latter one (-10) is chosen for further 
use. 
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To find a high-resolution representation, the region defined by the ROI above is 
taken out from the red component of the original image to form a sub-image. 
Contours are not extracted from the image at its original resolution, because these 
have been found to be too rough. Instead, the resulting sub-image Is expanded In 
5 size by a factor of two, using the Matlab® bilinear interpolation function, to give 
additional resolution. In bilinear interpolation, to find the values of a selected point 
not on the original image grid, its four nearest grid points are located, and the 
relative distances from the selected point to.each of its four neighbours are 
computed- These distances are used to provide a weighted average for the grey- 
10 level value at the selected point. 

This interpolated image is then smoothed before contouring, by correlating (see 
eariier description with reference to stage 21) with a 3x3 smoothing filter (s) 
defined thus: 

n 2 O 



15 



1 



2 4 2 
1 2 1 



Valid contours are then sought at each of several threshold levels. The range of 
threshold levels starts at the minimum grey-level within the sub-image, and 
increases up to the maximum grey-level in steps of 10 grey levels, so the actual 
' contour levels are set adaptively. Valid contours are defined in the same manner 
20 as for the low-resolution boundary above. 

Having found a set of valid contours at each threshold level, the edge strength at 
each point on the contour is estimated. The edge strength at each image pixel is 
defined as the modulus of the vector gradient of the original red component of the 



image /, where the vector gradient is defined asgrad/ = 



dl dA 



, where the two 



^dx dy) 

25 partial derivatives are obtained from taking differences In pixel grey-level values in 
the X and Y directions respectively. The edge strength at contour points which He 
between image pixels is estimated using bilinear interpolation (as described above) 
from the nearest pixel values. The mean edge strength around the contour is then 
computed. The contour having the greatest edge strength is chosen as being the 

30 most representative of the boundary of the object. If no valid contours are found, 
the object is rejected. 
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10 



15 



20 



25 



A consistency chedc between the low-resolution and high-resolution contours is 
then performed. The area within each contour is computed from the boundary 
contour using Green's Theorem (such as described in Chap 6 in Vector Analysis 
and Cartesian Tensors', D.E.Boume and P.C.Kendal, 2"** ed, Nelson, 1977). This 
gives the following formula for area: 



where {x^yyi) arfe the contour points. 

These areas are then subjected to the following tests: 

High-resolution area > 0.6*low-resoiution area 

High-resolution area < 1 .4* low-resolution area 
If either of these tests fail, the object is rejected. 

Finally, there Is a check that the low- and high-resolution contours for each object 
overlap sufficiently. For each object, two binary images are fonmed. The first binary 
image is formed by setting the value of pixels which lie within the low-resolution, 
contour to 1 , and those pixels outside the contour to 0, using the Matlab® function 
roipoly. The second binary image is formed in the same way from the high- 
resolution contour. The absolute difference of these two images is taken, resulting 
in another binary Image in which pixels are set to 1 if and only if they lie within one 
of the contours and not the other, and 0 othenvise. Connected component labelling 
(see stage 22 description) is applied to this new binary image to identify separate 
regions. The thickness of each of these regions is computed (as in stage 22). If any 
region thickness exceeds 5 pixels, the corresponding object is rejected. 

Simplified visualisations of the effects of the stage 23 processing are shown in 
Figures 3 and 4. 

In Figure 3(a) a local region of interest 30 is defined around the nominal centre 31 
of an object 32 which is represented In this Figure by a series of contours. A 
second object 33 also appears in the same sub-image. Figure 3(b) illustrates the 
low-resolution boundary contour 34 computed for the object 32 and Figure 3(c) the 
high-resolution boundary contour 35 computed for the same. Figure 3(d) 
illustrates the overlap between these two resolutions with the region of difference 



n-l 
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36 shaded. In this case the areas within the contours 34 and 35 are sufficiently 
similar and the thickness of the region 36 is sufficiently small to pass all the above- 
mentioned consistency checks and the object 32 will be retained. In effect these 
checks are showing that the object is sufRciently unifonnly dark to potentially 
5 represent a mitotic cell nucleus. The object 33 will not be treated as valid for the 
ROI 30 because Its contours are not consistent with the centre 31 . It will, 
however, be separately analysed within a separate ROI (not shown) defined 
around its own nominal centre. 

In Figure 4(a) there is another example of a local region of Interest 40 defined 
10 around an object 41 . Figure 4(b) illustrates the low-resolution boundary contour 
42 computed for this object and Figure 4(c) the high-resolution boundary contour 
43. In this case the area of the contour 43 is substantially less than that of the 
contour 42 (approximately 0.4) so it fails the first of the above-mentioned 
consistency checks and the object 41 will not be processed further. This is 
15 indicative of a normal epithelial cell nucleus which has a relatively darker nucleolus 
(chosen as the high-resolution boundary because of its high edge strength) 
surrounded by a less dark region (chosen as the low-resolution boundary). 

Boundary tracking 

The next processing stage 24 applies a tracking algorithm to the high-resolution 
20 contour representing the object's boundary for each object retained from the 
previous stage 23 in order to estimate curvature. The aim is to smooth the 
boundary and then measure curvature, because simply calculating curvature from 
the contour segments gives too rough a measurement. For the identification of 
mitotic cell nuclei the degree of non-convexity of the boundary is of interest, so the 
25 latter method of calculation is inappropriate. 

The particular algorithm which has been used in the preferred embodiment is 
based on a Probability Density Association Filter (PDAF), such as described in 
Y. Bar-Shalom and T.E.Fortmann, "Tracking and Data Association", Mathematics in 
Science and Engineering series, vol 179, Oriando, Fl., Academic Press, 1988. This 
30 type of tracking algorithm is designed to estimate the parameters of a chosen 

object (target) in the presence of other measurements which are not related to the 
object in question (noise and clutter). In this case the 'targef state variables are the 
position, orientation and curvature of the object's boundary, and the measurements 
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are the positions of the contour points and the orientations of the lines joining each 
pair of contour points. 

The PDAF filter requires a model for the dynamics of the boundary state. The 
boundary dynamics is given by a constant curvature (the radius of cun/ature is set 
5 to 10 pixels) plus an assumed random perturbation known as system noise. This 
noise is determined by the variance of curvature perturbation, which is chosen 
according to how irregular the boundary of a mitotic cell nucleus is expected to be. 
In the preferred embodiment the curvature variance is 9 for position and 0.09 for 
angle (in radians). 

10 As a starting point, it is assumed that for each object potentially representing a 
mitotic cell nucleus a connected set of edge features has been extracted from the 
image. In this case, edge features are line segments joining two adjacent contour 
points. Each edge feature has the following measurements that were made as part 
of the contour extraction process: 

15 Position Xm, Ym (horizontal and vertical image coordinates) of the centre.of 

the edge 

• Orientation Om, i.e. the angle between the edge and the horizontal. 

The purpose of the tracker is to estimate the most likely true location, orientation . 
and curvature Xs, yi, 6^s,A:of the boundary at each point from the above 
20 measurements, given that there are measurement errors with an assumed 
Gaussian distribution. The following information vectors are defined: 

• The measurement vector z = (x^, ym, ffnjl 

• The system state vectbr x^ (Xs,ys, Os^k). 

To use the PDAF filter to do this, the following information about the true boundary 
25 and the measurement process is required: 

• The relatloriship between the position, orientation and curvature of 
neighbouring points on the boundary (the system model). This Incorporates 
a transition matrix <2> linking neighbouring states x and a system noise 
model that adds extra random perturbations to x. 



wo 2004/055733 



16 



PCT/GB2003/005445 



• The relationship between the measurement vector z and the system state 
X. This incorporates a transition matrix H linking x to z and a measurement 
noise model that adds extra random perturbations to z, 

• It is assumed that not all of the edge features are associated with the 
5 nuclear boundary; the ones that are not are denoted clutter. 

In its most general form the PDAF processes several measurements z at each step 
1n estimating x. In this case only one edge feature is processed at a time, so' there 
are only two hypotheses to be tested; either the feature is from clutter or from the 
real nuclear boundary. 

10 The system transition matrix <P is based on constant curvature, so to predict a 
neighbouring system state the unique circle or straight line with curvature re, 
tangent slope 0s going through the point Xs, Vs is extrapolated to the point that is 
closest to the next measurement point. 

The system noise has a Gaussian distribution with zero mean and a covariance 
15 matrix based on Independent perturbations in curvature, orientation and lateral 
offset (movement In a direction normal to the boundary). A Brownian model is 
used, where the standard deviations of perturbations in curvature, orientation and 
lateral offset are proportional to the square root of the arc length of the 
eXtlrapolated circle of the previous paragraph. The accumulated effect of curvature 
20 perturbation on orientation and lateral offset is also modelled, resulting in the 
following covariance matrix: 

with respect to the system curvature k, system slope ^and lateral offset 
respectively, where s is the arc length (the circular distance between the previous 
25 . _ point and the estimate of the next point). The constants ojc. cts^ o-sy define the 

average roughness of the nuclear boundary, and depend on the type of cell being 
analysed. 

The measurement transition matrix H maps the system parameters to the 
measurement parameters in the natural way: 



3"^ S'^ 



Iv^ JLc^ 



0 0 
0 crj 
0 0 



0 
0 



sv J 
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0. 



ri 0 0 0^ 

0 10 0 
v^O 0 1 0 



The measurement noise is based on Independently Gaussian distributed 
perturbations of slope and lateral offset, resulting in the following covariance matrix 
with respect to measurement slope and lateral offset respectively: 



'me 



'my J 



The constants cr^^^, o?ny define the average smoothness of the nuclear boundary, 
and depend on the type of cell being analysed. 

The following constants are used to define the clutter model: 

• /> = Clutter density = average number of edges per unit area that are not 
1 0 associated with tiie nuclear boundary. . 

• Pd = Probability that the edge feature is associated with the true nuclear 
boundary. 

These constants depend on the clutter present in the image, both its edge strength 
relative to the nuclear boundary and its average spatial density. 

15 For a nucleus with average radius r the following parameters of the above model 
are used (all units are image pixels): 



Ok 



• Omy = 3 



• crm0 = 0.3 radians 



20 



Osy = 0.9 



cr^e- 0.09 radians 



>9=0.01 
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• Pd = 0.8 

The initial value for the system covariance matrix M is given by: 





ftty 


0 


0^ 






0 


_2 


0 


jWherecTjto =l/r 






0 







The PDAF estimator is now applied sequentially as follows. The matrices Hk, Qk 
5 and Rk are all constant In this application (as defined above). The following 

expressions are those known from the Bar-Shalom and Fortroann reference quoted 
above. 

Update: 

INNOVATION 



10 i-J 
KALMAN GAIN MATRIX 



Hlk =Z>^« i^M ^^^^ 3^ = Zki-HkX, 



K k =M k H\ S'k where S ic=H k M k H\ + R k 



BETA WEIGHTS 



Nk 



e«/[b + X;e«] for i^O 



Nk 



b /[ b + 2e« ] for i = 0 



where ew =e3cp(-KvaSk ) fori^'O, and b= yo( 1-Pd)/|27s7|/Pd2 



15 STATE ESTIMATE UPDATE 



ERROR COVARIANCE UPDATE 
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where 



P* =[ I - Kk Hk ] Mk 



Prediction: 

STATE ESTIMATE EXTRAPOLATION 

ik^i =Ok Xk 
ERROR COVARIANCE EXTRAPOLATION 



Mk+i =Ok PkO\ + Qk 



This process continues until the entire contour Is traversed (i.e. returned to the 
starting point). This Is then repeated around the contour for a second time (using 
the final conditions from the first pass as starting conditions for the second pass); 
1 0 this ensures that the final estimate of the smoothed contour is independent of the 
assumed initial conditions. 

The curvature of the smoothed contour derived from the PDAF tracker Is now used 
to find a measure of the degree of non-convexity of the object's boundary. Firstly, 
the sign of the curvature is set so that it is positive where the boundary Is locally 

1 5 convex and negative where locally concave (as viewed from outside the boundary). 
All positive values of curvature are then set to zero, leaving non-zero values only at 
locations where the boundary is locally concave. A graph of curvature (Y-axis) 
against perimeter arc length i.e. distance along the boundary (X-axis) is plotted, 
th^n the line integral of curvature with respect to arc length is computed. The 

20 absolute value of this integral is taken to produce a non-negative resulL The finai 
result is a dimensionless quantity giving an indication of overall non-convexity, 
called the "negative curvature area". Objects which are almost completely convex, 
in this case whose negative curvature area is less than 0.2, are then rejected. 
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The output from this process is a set of boundary measurements for each object, 
namely negative curvature area, and a more precise estimate of area. 

Grey-level nonmalisation 

Next a normalisation process 25 is carried out to allow for differences in overall 
5 brightness between different slides. For each remaining object, the mean grey level 
of the pixels enclosed within (but not on) the high-resolution contour found in stage 
23 is calculated. The statistics used for normalisation are the overall mean and 
standard deviation of the grey levels of the regions obtained from stage 22. Each 
object's grey level is then normalised (by subtracting this mean and dividing by this 
10 standard deviation). The output is a statistic for each assumed nucleus. 

Second clutter rejection 

The next process 26 involves a second stage of classification and clutter rejection 
based on the Fisher classifier to discriminate between objects representing mitotic 
and non-mitotic nuclei. The Fisher classifier is a known statistical classification 
15 metliod described for example in Section 4.3 of "Statistical Pattern Recognition" by 
Andrew R. Webb, Arnold Press, 1999, and is preferred for this stage of the process 
due to its robustness against overtraining. 

In this case the Fisher classifier uses a set of information about each object that 
has been derived by analysis as described above. Each information set is a feature 

20 vector, that is an ordered list of real numbers isach describing some aspect of the 
object; each component number is denoted an object feature. The purpose of the 
classification algorithm is to discriminate between two classes of object based -on 
the information contained In their feature vectors. The output of the algorithm is a 
set of numbers, one for each object, indicating the likelihood that the nucleus which 

25 it represents is a member of one of the two chosen classes (in this case the 
classes are mitotic and non-mitotic). 

For a given feature vector x, the standard implementation of the Fisher classifier 
output is defined as: 

n 

ib>l 
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where x=[x^, ...xj is the feature vector. In this embodiment, this definition has been 
extended to use non-linear functions of the feature vector, namely: 

where are prescribed real numbers and g^c are prescribed functions of the 
5 feature vector x. These functions and variables are chosen to give the lowest 
number of misclassifications for oiijects with l<nown class. " 

The components of the feature vector x are mean grey level (computed in stage 
25) and negative curvature area (computed in stage 24). In principle area could 
also be used, since smaller mitotic cell nuclei tend to be darker and less concave 
10 than larger ones. In this case a quadratic set of basis functions (g^) are used, so 
that the Fisher classifier value is given by: 

F ^a^G'¥a^G^ + a^C ^ a^GC a^C^ 

where G is the normalised grey-level, C is the negative curvature area, and the 
coefficients a/ are derived from the training stage referred to below. 

15 The coefficients and decision boundary for the Fisher classifier are obtained by 
training the classifier on a large number of example slides provided by a 
histopathologist where accurate ground truth (sets of mitotic and non-mitotic cells) 
is also available. The training stage results in a classifier boundary which 
minimises the total number of misclassifications, i.e. both false negatives (missed 

20 mitotic cells) and false positives (falsely-detected noh-mitotic cells). In the preferred 
embodiment the resulting coefficients a/ that have been derived from this training 
stage are [-0.87431, 0.10205. 0.84614, -0.18744, -0.04954, -5.56334]. Figure 5 
illustrates the classifier together with the data on which it was trained, where 
plusses indicate mitotic cells and crosses Indicate non-mitotic cells. 

25 Mitosis count 

Stage 27 counts the number of objects deemed to represent the nuclei of mitotic 
cells, that is to say only those objects whose values exceed a given threshold in 
the output of the Fisher classifier. The preferred criterion is F>0 (illustrated as the 
decision boundary in Figure 5), set to give the optimum trade-off between missed 
30 mitotic cells and falsely-detected non-mitotic cells. The number of objects whose 
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classifier value exceeds this threshold defines the mitotic count for that tile. The 
count for the ten tiles analysed is aggregated, and can be converted into an 
indication of cancer grade in accordance with a table as previously described in 
connection with the existing manual procedure. 

It will be appreciated that the coding of a computer program to implement all of the 
processing stages described above for the preferred embodiment of the invention 
can be achieved by a skilled programmer in accordance with jconventional 
techniques. Such a program and code will therefore not be described further. 
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CLAIMS 

1 . A method for the automated analysis of a digital image comprising an an^ay 
of pixels, including the steps of: 

(a) identifying the locations of objects within the image which have 
5 specified intensity and size characteristics; 

(b) defining regions of specified extent within the image which contain 
respective said objects; 

(c) deriving from the data within respective said regions one or more 
respective closed contours comprising points of equal intensities; and 

10 (d) estimating the curvature, of at least one respective said contour 

within respective said regions at least to produce a measure of any concavity 
thereof. 

2. A method according to claim 1 wherein step (a) comprises the application of 
15 a radially-symmetric difference filter with zero mean. 

3. A method according to claim 2 wherein the image is filtered at a plurality of 
resolutions of increasing scale. 

20 4. A method according to claim 2 or claim 3 wherein said locations are 

identified in accordance with the locations of respective local extrema in the output 
of said filter. 

5. A method according to claim 4 including the step of sorting, in order of 

25 intensity, local extrema in the output of said filter and selecting for further analysis 
only those objects which correspond to a specified proportion of said extrema in 
such order. 

6. A method according to any preceding claim further comprising, following 
30 step (a): 

selecting an intensity threshold related to the mean intensity of pixels within 
the image in neighbourhoods of said locations; 

creating a binary image according to whether pixels in the first-mentioned 
image are above or below said threshold; 
35 identifying regions in the binary image composed of connected pixels which 

are below said threshold in the first-mentioned image; and 
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rejecting from further analysis those objects which correspond to such 
regions in the binary innage which fall below a specified size or thickness. 

7. A method according to any preceding claim wherein step (c) comprises, for 
5 respective said regions, deriving respective first and second said contours having 
respectively lower and higher resolutions, determining whether the sizes and 
locations of said first and second contours are consistent within specified criteria 
and, if so consistent, selecting said second contour for step (d). 

10 8. A method according to claim 7 wherein, for respecfive said regions, the first 
said contour is derived by: 

seeking within the region one or more contours of respective specified 
intensities; 

determining whether the or each such contour Is a closed contour and 
15 meets specified location, size and/or Intensity orientation criteria; and 

if more than one such contour is a closed contour and meets such criteria, 
selecting from the same the contour of the lowest intensity. 

9. A method according to claim 8 wherein said specified intensities are no 

20 greater than that which corresponds to the contour of highest edge stijength within 
the respective region. 

10. A method according to claim 9 when appended to any one of claims 2 to 5 
wherein said first contour is derived by seeking one or more contours in the output 

25 of said filter for the respective region and said specified intensities are no greater 
than the zero level in such output. 

11. A method according to any one of claims 8 to 10 wherein, for respective 
said regions, the second said contour is derived by: 

30 seeking within the region a plurality of contours of respective specified 

intensities ranging between the lowest and highest intensities within the region; 

determining whether each such contour is a closed contour and meets 
specified location, size and/or intensity orientation criteria; and 

if more than one such contour is a closed contour and meets such criteria, 
35 selecting from the same the contour having the highest edge strength. 
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12. A method according to any preceding claim wlierein step (d) includes the 
application of a Probability Density Association Filter to respective said contours. 

13. A method according to any preceding claim wherein step (d) comprises, for 
5 respective said contours: 

measuring the curvature of the contour at a plurality of points around the 
contour, convexity and concavity being of opposite sign; 
setting convex values of such curvature to zero; 

plotting resultant values of curvature at said points against a measure of the 
10 distance of the respective point along the contour; and 

computing as said measure of concavity the line integral of such plot. 

14. A method according to any preceding claim further comprising the step of: 

(e) classifying objects into one of at least two classes in accordance with a 
15 function of said measure of concavity of a contour corresponding to the respective 

object and a measure of the mean intensity of the respective object. 

15. A method according to claim 14 wherein step (e) is perfonmed by use of a 
Fisher classifier. 

20 

16. A method according to claim 14 or claim 15 wherein the intensities of 
respective objects are normalised prior to step (e). 

17. A method according to any one of claims 14 to 16 further comprising the 
25 step of: 

(f) counting the number of objects classified into a specified one of said 
classes. 

18. A method according to any preceding claim for the automated analysis of a 
30 digital image of a histological or cytology specimen. 

19. A method according to claim 18 wherein the image is of a section of breast 
tissue. 

35 20. A method according to claim 18 or claim 19 when appended to claim 17, 
wherein said specified class is identified as the class of mitotic epitiielial cell nuclei. 
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21 . A method according to any one of claims 1 to 17 for the automated analysis 
of a digital image of a soil sample. 

22. A method for the automated identification of mitotic activity from a digital 
5 image of a histological specimen, including the steps of: 

(a) identifying the locations of objects within the image which have 
specified intensity and size characteristics associated with epithelial cell nuclei; 

(b) defining regions of specified extent within the image which contain 
respective said objects; 

10 (c) deriving from the data within respective said regions one or more 

respective closed contours comprising points of equal intensities; 

(d) estimating the curvature of at least one respective said contour 
within respective said regions at least to produce a measure of any concavity 
thereof; and 

15 (e) classifying objects as representing mitotic cell nuclei as a function of 

at least said measure of concavity of a contour corresponding to the respective 
object. 

23. Apparatus for the automated analysis of a digital image comprising means 
20 adapted to perform a method according to any preceding claim. 

24. A computer program product comprising a computer readable medium 
having thereon computer program code means adapted to cause a computer to 
execute a method according to any one of claims 1 to 22. 

25 

25. A computer program comprising instructions to cause a computer to 
execute a method according to any one of claims 1 to 22. 
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(57) Abstract: A method for the automated 
analysis of digital images, particularly for the 
purpose of assessing mitotic activity from images 
of histological slides for prognostication of breast 
cancer. The method includes the steps of identifying 
the locations of objects within the image which 
have intensity and size characteristics consistent 
with mitotic epithelial cell nuclei, taking the 
darkest 10 % of those objects, deriving contours 
indicating their boundary shape, and smoothing 
and measuring the curvature around the boundaries 
using a Probability Density Association Filter 
(PDAF). The PDAF output is used to compute a 
measure of any concavity of the boundary - a good 
indicator of mitosis. Objects are finally classified 
as representing mitotic nuclei or not, as a function 
of boundary concavity and mean intensity, by use 
of a Fisher classifier trained on known examples. 
Other uses for the method could include the analysis 
of images of soil samples containing certain types 
of seeds or other particles. 
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